Prequential AUC for Classifier Evaluation and Drift Detection in Evolving Data Streams
Abstract
Detecting and adapting to concept drift makes learning data stream classifiers a difficult task. It becomes even more complex when the distribution of classes in the stream becomes imbalanced. Currently, proper assessment of classifiers for such data is still a challenge, as existing evaluation measures either do not take into account class imbalance or are unable to indicate class ratio changes in time. In this paper, we advocate the use of the area under the ROC curve (AUC) in imbalanced data stream settings and propose an incremental algorithm that uses a sorted tree structure with a sliding window to compute AUC using constant time and memory. Additionally, we experimentally verify that this algorithm is capable of correctly evaluating classifiers on imbalanced streams and can be used as a basis for detecting sudden changes in class definitions and imbalance ratio.
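The windowed AUC computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract mentions a sorted tree structure, whereas this sketch assumes a plain sorted list maintained with `bisect`, which keeps the code short at the cost of slower inserts and removals. The class name `WindowedAUC`, its methods, and the convention that labels are 0 (negative) or 1 (positive) are all hypothetical choices made for illustration.

```python
from bisect import bisect_left, insort
from collections import deque


class WindowedAUC:
    """Illustrative sliding-window AUC (hypothetical helper, not the paper's code)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()      # (score, label) in arrival order, used for eviction
        self.sorted_items = []     # the same pairs, kept sorted by ascending score
        self.n_pos = 0
        self.n_neg = 0

    def add(self, score, label):
        """Add one scored example; label must be 1 (positive) or 0 (negative)."""
        if len(self.window) == self.window_size:
            # Forget the oldest example so only the last window_size remain.
            old = self.window.popleft()
            del self.sorted_items[bisect_left(self.sorted_items, old)]
            if old[1] == 1:
                self.n_pos -= 1
            else:
                self.n_neg -= 1
        item = (float(score), int(label))
        self.window.append(item)
        insort(self.sorted_items, item)
        if item[1] == 1:
            self.n_pos += 1
        else:
            self.n_neg += 1

    def auc(self):
        """Return AUC over the current window, or None if one class is absent."""
        if self.n_pos == 0 or self.n_neg == 0:
            return None
        total = 0.0
        neg_below = 0  # negatives with a strictly lower score than the current group
        i, n = 0, len(self.sorted_items)
        while i < n:
            # Process all examples sharing the same score as one tied group.
            j, pos_tied, neg_tied = i, 0, 0
            while j < n and self.sorted_items[j][0] == self.sorted_items[i][0]:
                if self.sorted_items[j][1] == 1:
                    pos_tied += 1
                else:
                    neg_tied += 1
                j += 1
            # Each positive beats every lower negative; ties count as half.
            total += pos_tied * (neg_below + 0.5 * neg_tied)
            neg_below += neg_tied
            i = j
        return total / (self.n_pos * self.n_neg)
```

In a prequential (test-then-train) loop, one would call `add(score, label)` with the classifier's score for each incoming example before training on it, and read `auc()` whenever an up-to-date estimate is needed. Because the window size is fixed, the cost per example depends on the window size and not on the length of the stream.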
Related resources
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifi...
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the ...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
Algorithm to handle Concept Drifting in Data Stream Mining
Data Stream Mining is the evolving field of research. Mining continuous data streams brings unique opportunities but also new challenges. This paper will describe and evaluate the proposed classifier which uses ensemble classifier along with the boosting concept. Adaptive windowing is also used for handling the data stream. Empirical study will show that the proposed classifier takes less memor...
Accuracy Updated Ensemble for Data Streams with Concept Drift
In this paper we study the problem of constructing accurate block-based ensemble classifiers from time evolving data streams. AWE is the best-known representative of these ensembles. We propose a new algorithm called Accuracy Updated Ensemble (AUE), which extends AWE by using online component classifiers and updating them according to the current distribution. Additional modifications of weight...